Replace ONNX simplification package from onnxsim to onnxslim #478
Conversation
@gcunhase Hi, any update here? Thanks.
@inisis I'm still validating. Specifically, please check that the following CLI is still functional and performant:
$ python -m modelopt.onnx.quantization --onnx_path=/mnt/models/bevformer_tiny_epoch_24_cp2_op13.onnx \
--trt_plugins=$PLUGIN_PATH \
--op_types_to_exclude MatMul \
--calibration_data_path=/workspace/BEVFormer_tensorrt/data/nuscenes/calib_data.npz \
--simplify
Thanks!
Hi @gcunhase, it took me some time to run bevformer-int8-eq, but everything is working fine. Here are the results:
Env
Without simplify
With onnxsim
With onnxslim
To conclude: in terms of GPU Compute Time (median, ms), onnxsim is slightly faster. Comparing the two simplified models, onnxslim merges MatMul + Add into Gemm, which works against --op_types_to_exclude MatMul (see the sketch below).
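To make that last point concrete, here is a small sketch (built with onnx.helper; the shapes and initializer values are arbitrary) of the fusion in question: after slimming, the MatMul/Add pair is expected to show up as a single Gemm node, which is why excluding MatMul by op type no longer catches it.

```python
import onnx
from onnx import TensorProto, helper
import onnxslim

# Build a tiny graph containing the MatMul -> Add pattern discussed above.
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 4])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 8])
W = helper.make_tensor("W", TensorProto.FLOAT, [4, 8], [0.1] * 32)
B = helper.make_tensor("B", TensorProto.FLOAT, [8], [0.0] * 8)
graph = helper.make_graph(
    [
        helper.make_node("MatMul", ["X", "W"], ["mm"]),
        helper.make_node("Add", ["mm", "B"], ["Y"]),
    ],
    "matmul_add",
    [X],
    [Y],
    initializer=[W, B],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])

slimmed = onnxslim.slim(model)
# Expected to print a single Gemm op instead of the original MatMul + Add pair.
print([node.op_type for node in slimmed.graph.node])
```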
Hi @inisis, thanks for validating this functionality. Were you also able to validate the numerical accuracy of the quantized model? I will also do some investigation on the MatMul+Add vs Gemm substitution on my end in the meanwhile. Thanks!
@gcunhase I didn't use the full nuscenes dataset, it's too big; I used the mini one for calibration. If that counts, I can verify accuracy on the mini one.
No problem, let me try to verify the accuracy on my end. Thank you! |
Hi @gcunhase, is there any update? Thanks
@inisis we appreciate your contribution and wanted to make sure that there are no regressions before merging this PR. We've investigated potential risks in ~150 models and compiled a list of issues, divided into 3 categories, that would need to be solved before merging. All mentioned models and scripts are in the zip file: repro.zip
1. Functional failures
Error logs:
Error 1: repro_io_tensors_shape_dtype.onnx
Error 2: repro_mode_error_mobilenetv1.onnx
How to repro:
import onnx
import onnxslim
model = onnx.load(input_model_path)
simplified_model = onnxslim.slim(model)
2. ORT inference failures
Error logs:
Error 1: repro_mul_incompatible_dimensions.onnx
Error 2: repro_gemm_invalid_shape.onnx
How to repro: Run the comparison script included in repro.zip.
3. ORT numerical accuracy failures
Error logs: The simplified versions of the following models do not produce the same outputs as the original model for the same input data:
How to repro: Run the comparison script included in repro.zip (a self-contained sketch of these checks follows below).
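A minimal self-contained sketch of these checks (the model path is a hypothetical file from repro.zip; it assumes onnx, onnxslim, and onnxruntime are installed):

```python
import onnx
import onnxruntime as ort
import onnxslim

input_model_path = "repro_io_tensors_shape_dtype.onnx"  # hypothetical path from repro.zip

# Category 1: the simplification call itself must succeed and produce a valid model.
model = onnx.load(input_model_path)
simplified_model = onnxslim.slim(model)
onnx.checker.check_model(simplified_model)
onnx.save(simplified_model, "simplified.onnx")

# Category 2: the simplified model must still build an ORT inference session.
sess = ort.InferenceSession("simplified.onnx", providers=["CPUExecutionProvider"])
print([o.name for o in sess.get_outputs()])
```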
@gcunhase So much appreciation for your comprehensive testing, which has helped us improve onnxslim. All the issues you mentioned have been resolved in version 0.1.75 of onnxslim, and these models have also been added to onnxslim's daily CI. Many thanks again. Here are some details on how the issues were solved:
1. Functional failures: If a model ends with a custom operator as an output, onnxslim is unable to do symbolic shape inference for it, so the output would lose its dtype and shape. We improved this by reusing the info already stored in the original model.
2. ORT inference failures: In onnxslim, shape inference for the outputs of a Resize node was aligned with the official ONNX documentation, whereas in onnxruntime the output size is rounded, so there was a mismatch and, in some cases, an incompatible_dimensions issue. We are now aligned with ORT.
3. ORT numerical accuracy failures: There is a precision issue with issue3_repro_conv_resize_issue.onnx; with ORT layout optimizations disabled, the np.array_equal check passes.
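As a rough illustration of the Resize mismatch described in point 2 (the size and scale below are made up; the only point is that floor-style and rounded output sizes can differ by one):

```python
import math

dim, scale = 7, 0.55            # hypothetical spatial size and Resize scale factor
print(math.floor(dim * scale))  # 3 -> floor-style output size (spec interpretation)
print(round(dim * scale))       # 4 -> rounded output size (ORT-style), off by one
```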
gcunhase left a comment
@inisis we appreciate your speedy and detailed reply!
I was able to verify that all cases now pass with v0.1.75 and that disabling layout optimizations in ORT solves the numerical accuracy issue observed in the last model. This is achieved by adding the following line in our comparison script (as you suggested):
session_opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
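For illustration, a minimal sketch of a comparison that applies this setting (the model paths and input shape are hypothetical placeholders, not the actual ModelOpt comparison script):

```python
import numpy as np
import onnxruntime as ort

session_opts = ort.SessionOptions()
# Cap graph optimizations at EXTENDED so ORT's layout optimizations (applied only at
# ORT_ENABLE_ALL) do not perturb the numerical comparison.
session_opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED

ref_sess = ort.InferenceSession("original.onnx", sess_options=session_opts,
                                providers=["CPUExecutionProvider"])
test_sess = ort.InferenceSession("simplified.onnx", sess_options=session_opts,
                                 providers=["CPUExecutionProvider"])

# Feed the same random input to both models and require identical outputs.
feeds = {i.name: np.random.rand(1, 3, 224, 224).astype(np.float32) for i in ref_sess.get_inputs()}
for ref, test in zip(ref_sess.run(None, feeds), test_sess.run(None, feeds)):
    assert np.array_equal(ref, test)
```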
Approved.
@kevalmorabia97 can you please update the CHANGELOG file? I'm not sure which ModelOpt version this update will be included in. My suggestion would be something like the following. Thanks.
kevalmorabia97 left a comment
Thanks for your contribution. Great to have a better ONNX simplification package! Will wait for CI/CD to pass and then merge.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #478      +/-   ##
==========================================
+ Coverage   74.76%   74.78%    +0.02%
==========================================
  Files         183      183
  Lines       18630    18626        -4
==========================================
+ Hits        13929    13930        +1
+ Misses       4701     4696        -5
☔ View full report in Codecov by Sentry.
@inisis there is a conflict installing with torch 2.6
There is a sympy version conflict
Can this be fixed?
Yes, I will check it asap; I don't understand why PyTorch needs to pin SymPy to version 1.13.1.
@kevalmorabia97 the latest pytorch requires sympy>=1.13.3 (https://github.com/pytorch/pytorch/blob/main/pyproject.toml#L47). There is also a conflict in onnxslim's CI, but it didn't break the pipeline.
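One quick way to see which sympy constraints are in play in a given environment is to inspect the installed package metadata (a sketch; it only prints whatever requirements the installed torch and onnxslim distributions declare):

```python
from importlib.metadata import requires, version

print("installed sympy:", version("sympy"))
for pkg in ("torch", "onnxslim"):
    reqs = requires(pkg) or []
    # Show only the requirement strings that mention sympy.
    print(pkg, "->", [r for r in reqs if r.lower().startswith("sympy")])
```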
Refactored AutoQuantizeSearcher into _AutoQuantizeBaseSearcher and AutoQuantizeGradientSearcher; separated quant modules and score modules (NVIDIA#586). This refactor prepares the architecture for additional search methods and separates quantization modules from scoring modules, enabling auto-quantization to measure sensitivity at parent layers (e.g., MLP output for MoE experts) rather than individual ops. See also NVIDIA#592 and NVIDIA#588. Tested via tests/unit/torch/quantization/test_autoquant.py and tests/unit/torch/quantization/plugins/test_huggingface.py.
Signed-off-by: realAsma <[email protected]>
Co-authored-by: Asma Kuriparambil Thekkumpate <[email protected]>
Signed-off-by: inisis <[email protected]>
Force-pushed from 77d6bf9 to 18c27dd (compare)
/ok to test 3b1e46c



What does this PR do?
Type of change:
Add onnxslim support
Overview: Onnxslim is under active development and committed to long-term support; it's easy to use and depends on very few packages.
Usage
Testing
Before your PR is "Ready for review"
Additional Information